CS224N Final Project: Movie Title Recognition in E-Mails

Author

  • Albert Lai
Abstract

For this project, a system was designed to first identify whether an email mentions a movie and, if it does, to extract the title, time, and date of the movie in question. The classifier used to determine whether an email is movie or non-movie is an extension of the Naive Bayes classifier. The classifier was fairly successful in terms of precision, although it tended to yield a lot of false positives. A named entity recognition (NER) system using a maximum entropy Markov model (MEMM) was used to tag movie titles, locations, addresses, dates, and times. The NER system saw much less success than the classifier, although some labels, such as 'time', did fairly well. The dataset used to train both the classifier and the NER system turned out to be fairly small, so the system could see some improvement with more training data.
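The abstract does not say how the Naive Bayes classifier was extended, so the following is only a minimal sketch of a plain multinomial Naive Bayes movie/non-movie classifier with add-one smoothing. The class name `NaiveBayes`, the whitespace tokenizer, and the toy emails in `TOY_EMAILS` are all hypothetical and are not taken from the project.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


class NaiveBayes:
    """Plain multinomial Naive Bayes with additive smoothing (not the paper's extension)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha          # smoothing constant (add-one by default)
        self.word_counts = {}       # label -> Counter of token counts
        self.doc_counts = Counter() # label -> number of training emails
        self.vocab = set()          # all tokens seen in training

    def train(self, examples):
        for text, label in examples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
TOY_EMAILS = [
    ("want to see the movie at the theater friday at 7pm", "movie"),
    ("tickets for the new film showing tonight", "movie"),
    ("please send the quarterly report by monday", "non-movie"),
    ("meeting moved to the conference room at noon", "non-movie"),
]

clf = NaiveBayes()
clf.train(TOY_EMAILS)
print(clf.predict("is the film showing at the theater tonight"))
```

Working in log space avoids numeric underflow on long emails, and the smoothing constant keeps a single unseen word from zeroing out a class.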


Similar resources

CS224n Final Project

I introduce a novel method for disambiguating word senses using a semi-supervised approach. I contrast this method with the current state-of-the-art approaches and show that my approach performs well and could potentially lead to fully unsupervised approaches with high accuracy.


E-politeness in Iranian English Electronic Requests to the Faculty

This paper reports the findings of a study designed to investigate English e-requests of Iranian EFL postgraduate students (i.e., nonnative speakers of English) made to their professors during their education at Islamic Azad University, Najaf Abad Branch, Isfahan, Iran, to find out types of politeness features employed in the students’ e-mails and the extent to which these features might influence ...


Managing Personal Information by Automatic Titling of E-mails

This paper presents an approach that enables automatic titling of e-mails, relying on a morphosyntactic study of real titles. Automatic titling of e-mails serves two purposes: titling e-mails that have no subject ("no object") and managing personal information. The method is developed in three stages: determining candidate sentences for titling, extracting noun phrases from the candidate sentences, and finally, selectin...


Personal Semantic Data

This paper presents an approach that enables automatic titling of e-mails, relying on a morphosyntactic study of real titles. Automatic titling of e-mails serves two purposes: titling e-mails that have no subject ("no object") and managing personal information. The method is developed in three stages: determining candidate sentences for titling, extracting noun phrases from the candidate sentences, and finally, selectin...


An Efficient Two-phase Spam Filtering Method Based on E-mails Categorization

An e-mail’s header section usually contains important attributes, such as the e-mail title, the sender’s name, the sender’s e-mail address, and the sending date, which are helpful for classifying e-mails. In this paper, we apply a decision tree data mining technique to the header’s basic attributes to analyze the association rules of spam e-mails and propose an efficient spam filtering method to accurately identify ...



Publication date: 2009